-
Notifications
You must be signed in to change notification settings - Fork 0
WebAssembly Branch Hinting? #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think that's possible in principle. In assembler terms you can have a label attached to the function start and to each instruction, and you can express the difference between them as an assembler expression. Actually I think that means that you don't even need a relocation, the label difference can be calculated at assembly time. If we wanted to lower the branch weight metadata into wasm custom section metadata at the LLVM level, then we'd need to make the section payload be something other than an LLVM string constant; it would have to be something that symbolically links to the function or instruction. I guess this must be possible somehow, as debug info does something like that; but it might be a special case. But we'd need something that would survive backend passes reordering the blocks and such. It might make more sense to do something at IR lowering time instead, and attach the data somehow to the MachineInstr representing the branch. Actually come to think of it, other targets must do something like this too, right? Since the passes that use the branch weights would be backend passes like block placement. Maybe there's some common code we can use for that, that we can use at the end of the backend pipeline when we go from MachineInstr IR to MC. |
This does sounds like something that would need to a relocation, at least for the function index. As derek said the offset from the begining of the function can likely be calculated by the assembly so should not need a relocation (i.e. the linker cannot change this since it never touches the bytes of a function). The current relocation we have are listed in |
Makes sense. I got stuck finding where the IR metadata gets converted into the machine layer, but I did find how things work there:
For the IR option here, we'd need the usual function index LEB, which I am guessing Overall it seems like we could do this either
I'll look into this some more, but any advice between these two very different paths would be welcome, given I'm not familiar with LLVM much. |
I think the latter would be most straightforward. If you look in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp you can see where we create custom sections for function attributes ( You'll end up with asm code that switches back and forth between the code and metadata sections, emitting bytes incrementally to the latter (just as each instruction emits incrementally to the code section). The assembler will combine these into one section in the object file, and then the linker will combine them into one section in the output. We just need to figure out how get the header correct (i.e. the length of the vector of functions)? The other option (instead of incrementally emitting the custom section during instruction emission) could be to emit the whole function's worth of the section at once at the end of the function (or even at the end of the object file). That could work if we can't know ahead of time how many annotations each function has, but I think we'd still need to make sure we emit labels for each branch (during instruction emission) to calculate the offsets, and we'd still need to figure out how to get things right after linking. |
Thanks, I think I wrapped my head around some AsmPrinter basics, and got it to emit an empty branch hint section... next, the real 99% |
Yes, at some point I noticed that diff ended up as absolute, I guess since the two symbols are in the same section. So all the problem here is really just about the relocatable function symbol index. |
if (Value->evaluateAsAbsolute(IntValue, getAssemblerPtr())) { | ||
emitULEB128IntValue(IntValue); | ||
return; | ||
} | ||
insert(getContext().allocFragment<MCLEBFragment>(*Value, false)); | ||
|
||
if (!PadTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be even more specific here, the thing I was hoping for was that we could tell, based on some property of Value
, whether there needs to be a relocation. If so, we can basically just assume that the size will be 5 (The only non-32-bit relocation I know of would be for memory addresses, so there could be i64.const instruction in code with 9-byte relocatable LEBs, but I guess those are handled differently elsewhere). And then we don't need the explicit PadTo in the streamer API or the separate assembler directive.
The question of course, is whether that's possible. MCExpr has evaluateAsRelocatable
alongside evaluateAsAbsolute
but currently I'm not totally sure what it means.
@@ -609,6 +654,65 @@ void WebAssemblyAsmPrinter::emitJumpTableInfo() { | |||
// Nothing to do; jump tables are incorporated into the instruction stream. | |||
} | |||
|
|||
std::optional<bool> WebAssemblyAsmPrinter::getBranchHint(const MachineInstr& MI) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aheejin This is the code I mentioned in the meeting, PTAL if you have time. Maybe there is a better way to do this?
I added some comments here and in the calling code that hopefully explain why it works as it does. At a high level:
- During
emitInstruction
we see if we need a hint on a br_if. We do it at that time, so that we can emit a Label if we want a hint (we need the label to refer to the br_if from the custom section). That is after CFGStackify. At this stage we just emit the Labels and stash the information. - After emitting all functions, we gather the stashed information, and emit the final section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a better way if we do this after CFGStackify, and this is roughly what I'd do as well. But this code may not handle try
-catch
and exceptions.
When EH is enabled, one of a BB's successor can be an unwind destination, so in theory a BB can have three successors. I don't think we generate that kind of code, but I think it is safe to check it and exclude it. (by calling MachineBasicBlock::isEHPad()
or something)
And with try
-catch
the fallthrough block may not be the next block. For example:
try
...
br_if ...
catch ;; br_if falls through to NOT here
...
end
;; br_if falls through to here
In this case both the true and false destinations will be successors but it may be tricky to figure out which one is which... To figure this out we may need to keep track of the control flow stack so we can compute the (true) destination of the branch? Which sounds like an overkill. This kind of fallthrough is generated by this optimization which may not be critical, meaning if someone wants performance they would've run binaryen anyway, so if this becomes a problem we can consider removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, what does an unwind target from a BB that ends in a br_if mean? That there is a call in the middle of the BB, and it threw? (Do such calls not need to be terminators of BBs? Sorry, I don't know much about wasm backend control flow...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think of it, I think there can't really be a case with three successors (two branch successors and an unwind successor). Because, so when a BB has a call that can throw and has an unwind BB successor, before CFGStackify it would look like
...
bb:
;; successor: ehpad, cont
call $maythrow
br $cont
ehpad:
catch ...
cont:
...
And we remove that br $cont
in CFGStackify because it is not necessary. In this case there's no chance that bb
will end with br_if
. Also this is not feasible:
bb:
;; successor: ehpad, somewhere, cont
call $maythrow
br_if $somewhere
br $cont
ehpad:
Because a BB with a throwing call (and has an unwind successor) wouldn't have a br_if
after the throwing call.
What I can't guarantee we can't have is the case of try
block's last BB that doesn't have an unwind dest BB. Because we can wrap many throwing calls within a big try
-catch
, the last BB within a try
part may not be a BB with an unwind destination. For example:
bb0:
;; successor: ehpad, cont
call $maythrow
br $cont
bb1:
;; successor: ehpad, cont
call $maythrow
br $cont
bb2:
;; successor: somewhere, cont
... something else ...
br_if $somewhere
br $cont
ehpad:
...
cont:
In this case, if br $somewhere_else
is removed, we don't have the info on bb2
's fallthrough block. This situation is pretty contrived and it may not happen in reality. I just can't guarantee it won't. But given that this branch hint is afer all a 'hint', we may not need to consider this case. When seeing a br_if
, just checking if the fallthrough (= next) block is an EH pad, and if so, not annotating that branch can be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About
In this case, if
br $somewhere_else
is removed
Was this a typo of br $cont
, or br_if $somewhere
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not aware about terminator lists until you wrote this comment... interesting.
PTAL at the commit where I just tried to handle the corner cases you mention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About
In this case, if
br $somewhere_else
is removedWas this a typo of
br $cont
, orbr_if $somewhere
?
Sorry, br $cont
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments below. Please ignore my first comment (which I've now deleted); I was confused and the comment was incorrect.
// the parent block of this BR_IF. (If there are three successors, due to some | ||
// terminator before us, give up on this hint.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about due to an additional unwind successor
or ... EH pad successor
?
// It must be one of the successors. | ||
assert(FalseDest == MBB1 || FalseDest == MBB2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion does not hold if we fall through to an EH pad
// Stop if the fallthrough is an EH pad, because that could happen like this: | ||
// | ||
// bb1: | ||
// ;; successor: if.true, otherwise | ||
// .. | ||
// br_if $if.true | ||
// br $otherwise | ||
// | ||
// ehpad: | ||
// | ||
// If that br $otherwise is optimized out, it would look like we fall through | ||
// to the ehpad (since it is the block physically after us), but in wasm's | ||
// structured control flow that is not the case. Rather than try to find the | ||
// true fallthrough, give up on a hint in this rare case. | ||
if (FalseDest->isEHPad()) | ||
return {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can have another corner case too... For example
bb0:
...
br_if
bb1:
delegate
ehpad:
catch
So this delegate
here would have been inserted after we first place try
-catch
markers to fix unwind mismatches. We add delegate
in a separate BB because it is also a terminator. In this case the fallthrough BB is not an EH pad but we still shouldn't fall through to the next BB.
I think a safer solution could be... Just bail out when one of the ParentBB's successor is not the next BB in the layout order. I think this can handle both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that is much simpler and makes perfect sense. The last commit includes that and other fixes from your comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this part looks good to me now (I haven't yet looked at other parts)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't look carefully at the other parts yet 😆 There are some bad hacks, and I am still debugging the case of multiple files...
I haven't checked the code yet, but this PR was uploaded yesterday: llvm#146230 |
Thanks! Good to know... I'll take a closer look Monday, but looks like that PR is in better shape than mine, none of the horrible hacks at least... so I imagine the LLVM side of my work won't be needed. Funny timing, I just got multi-file hinting working Friday and the first full benchmarks up and running 😆 |
That other PR definitely looks like the way to go in LLVM, compared to this, so I'll close this one... Last update on my work in this area, the code here is hackish but fully functional, though it does need
Somewhat disappointing, but I did confirm that making an artificial benchmark with a 30% benefit is not difficult: https://gist.github.com/kripken/09ad895764cd6cd3b7a8a6c13c643c93 I'll wait for that other PR to land in LLVM, and meanwhile am working on wasm-opt optimizer support for branch hints (properly updating the hint when we flip an if, for example). |
Branch hint support (or support for code metadata in general) in wasm-opt would be great! Currently, I have to hard-disable wasm-opt and meta-dce in emscripten since (as you said) it invalidates branch hints. I'll respond to your other points about the performance under the other PR. Sorry for noticing this PR here so late, it didn't appear in any searches :/ |
No worries, maybe I should have opened it on LLVM, but it began as more of a question/experiment... Anyhow, what matters is we get good support in LLVM, and your PR looks right for that. |
@dschuff @sbc100 This is an early beginning of an attempt to support branch hints in LLVM, but I honestly do not really know what I am doing 😄 Maybe you can tell me if I am even in the right direction.
Background: This is an example of an LLVM branch weights hint:
That br has
42/(42+1337)
(estimated) chance to go to the first target.Reading stuff in the wasm backend and lld, it seems like there is a generic mechanism for generating custom sections in LLVM IR, compiling them, and linking them, so I was wondering if appending something like this could work:
It looks like llc emits a custom section from that. So "all" I need is to replace the nonsense bytes with the proper info.
The proper info should somehow refer to the instruction. I am guessing a relocation is needed..? We'd need one both for the function and the offset of the instruction in the function. Is that possible, or should I be doing something entirely different?
(The code in this PR adds an IR pass to the wasm backend, and successfully reads
branch_weights
. It does not emit anything yet.)